Stability Selection


A Stable Lasso

Nouraie, Mahdi, Zhu, Houying, Muller, Samuel

arXiv.org Machine Learning

The Lasso has been widely used as a method for variable selection, valued for its simplicity and empirical performance. However, the Lasso's selection stability deteriorates in the presence of correlated predictors. Several approaches have been developed to mitigate this limitation. In this paper, we provide a brief review of existing approaches, highlighting their limitations. We then propose a simple technique to improve the selection stability of the Lasso by integrating a weighting scheme into the Lasso penalty function, where the weights are defined as an increasing function of a correlation-adjusted ranking that reflects the predictive power of predictors. Empirical evaluations on both simulated and real-world datasets demonstrate the efficacy of the proposed method. Additional numerical results demonstrate the effectiveness of the proposed approach in stabilizing other regularization-based selection methods, indicating its potential as a general-purpose solution.
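
The core trick is easy to sketch: a weighted L1 penalty sum_j w_j |beta_j| is equivalent to a plain Lasso on a design whose j-th column is scaled by 1/w_j. Below is a minimal Python sketch of this equivalence; the correlation-with-response ranking and the weight function used here are simple placeholders, not the paper's correlation-adjusted ranking.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    y = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)

    # Placeholder ranking: rank predictors by |correlation with y| and
    # penalize weakly ranked predictors more heavily (weights increase
    # with rank, mimicking "an increasing function of a ranking").
    corr = np.abs(np.corrcoef(X.T, y)[:-1, -1])
    ranks = np.argsort(np.argsort(-corr))         # rank 0 = strongest predictor
    w = 1.0 + ranks / p                           # weights in [1, 2)

    # Weighted Lasso via rescaling: fit on X_j / w_j, map coefficients back.
    model = Lasso(alpha=0.1).fit(X / w, y)
    beta = model.coef_ / w
    print("selected:", np.flatnonzero(beta != 0))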


Robust variable selection for spatial point processes observed with noise

Sturm, Dominik, Sbalzarini, Ivo F.

arXiv.org Machine Learning

We propose a method for variable selection in the intensity function of spatial point processes that combines sparsity-promoting estimation with noise-robust model selection. As high-resolution spatial data becomes increasingly available through remote sensing and automated image analysis, identifying spatial covariates that influence the localization of events is crucial for understanding the underlying mechanisms. However, results from automated acquisition techniques are often noisy, for example due to measurement uncertainties or detection errors, which leads to spurious displacements and missed events. We study the impact of such noise on sparse point-process estimation across different models, including Poisson and Thomas processes. To improve noise robustness, we propose to use stability selection based on point-process subsampling and to incorporate a non-convex best-subset penalty to enhance model-selection performance. In extensive simulations, we demonstrate that such an approach reliably recovers true covariates under diverse noise scenarios and improves both selection accuracy and stability. We then apply the proposed method to a forestry data set, analyzing the distribution of trees in relation to elevation and soil nutrients in a tropical rain forest. This shows the practical utility of the method, which provides a systematic framework for robust variable selection in spatial point-process models under noise, without requiring additional knowledge of the process.
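
To make the subsampling idea concrete, here is a minimal sketch of stability selection by subsampling, with an L1-penalized Poisson GLM standing in for the point-process intensity fit. The actual method operates on spatial point patterns and uses a non-convex best-subset penalty; both are simplified away here, and all data and parameters below are illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, p = 300, 10
    X = rng.standard_normal((n, p))
    counts = rng.poisson(np.exp(0.8 * X[:, 0] - 0.6 * X[:, 1]))

    B = 50
    freq = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)   # subsample
        fit = sm.GLM(counts[idx], X[idx],
                     family=sm.families.Poisson()).fit_regularized(
                         alpha=0.05, L1_wt=1.0)           # pure L1 penalty
        freq += np.abs(fit.params) > 1e-8

    # Keep covariates whose selection frequency clears a threshold.
    print("selection frequencies:", freq / B)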


Stability Selection via Variable Decorrelation

Nouraie, Mahdi, Smith, Connor, Muller, Samuel

arXiv.org Machine Learning

The Lasso is a prominent algorithm for variable selection. However, its instability in the presence of correlated variables in the high-dimensional setting is well-documented. Although previous research has attempted to address this issue by modifying the Lasso loss function, this paper introduces an approach that instead simplifies the data processed by the Lasso. We propose that decorrelating variables before applying the Lasso improves the stability of variable selection regardless of the direction of correlation among predictors. Furthermore, we highlight that the irrepresentable condition, which ensures consistency for the Lasso, is satisfied after variable decorrelation under two assumptions. In addition, noting that the instability of the Lasso is not limited to high-dimensional settings, we demonstrate the effectiveness of the proposed approach for low-dimensional data. Finally, we present empirical results that indicate the efficacy of the proposed method across different variable selection techniques, highlighting its potential for broader application. The DVS R package has been developed to facilitate the implementation of the methodology proposed in this paper.
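
A minimal sketch of the decorrelate-then-select idea, using a ZCA whitening transform as the decorrelation step; the paper's actual procedure is implemented in the DVS R package and may well differ from this simplification.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(2)
    n, p = 200, 15
    X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated design
    y = X[:, 0] - X[:, 1] + rng.standard_normal(n)

    # ZCA whitening: decorrelates columns while staying close to the originals.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    Z = Xc @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)

    fit = Lasso(alpha=0.1).fit(Z, y)
    print("selected:", np.flatnonzero(fit.coef_ != 0))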


On the Selection Stability of Stability Selection and Its Applications

Nouraie, Mahdi, Muller, Samuel

arXiv.org Machine Learning

Stability selection is a widely adopted resampling-based framework for high-dimensional structure estimation and variable selection. However, the concept of 'stability' is often narrowly addressed, primarily through examining selection frequencies, or 'stability paths'. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection framework, moving beyond single-variable analysis. We suggest that the stability estimator offers two advantages: it can serve as a reference to reflect the robustness of the obtained outcomes, and it can help identify an optimal regularization value to improve stability. By determining this value, we aim to calibrate key stability selection parameters, namely, the decision threshold and the expected number of falsely selected variables, within established theoretical bounds. Furthermore, we explore a novel selection criterion based on this regularization value. With the asymptotic distribution of the stability estimator previously established, convergence to true stability is ensured, allowing us to observe stability trends over successive sub-samples. This approach sheds light on the required number of sub-samples, addressing a notable gap in prior studies. The 'stabplot' package has been developed to facilitate the use of the plots featured in this manuscript, supporting their integration into further statistical analysis and research workflows.
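
As an illustration of using a stability estimator to guide regularization, the sketch below computes the Nogueira et al. (2018) estimator, one natural candidate for the 'established stability estimator', over a grid of Lasso penalties; the most stable penalty can then inform the calibration described above. The choice of estimator and the grid are assumptions of this sketch.

    import numpy as np
    from sklearn.linear_model import Lasso

    def stability(S):
        """Nogueira et al. (2018) stability of a binary selection matrix S (B x p)."""
        B, p = S.shape
        phat = S.mean(axis=0)
        var = B / (B - 1) * phat * (1 - phat)       # unbiased per-feature variance
        qbar = S.sum(axis=1).mean()                 # average number selected
        return 1 - var.mean() / ((qbar / p) * (1 - qbar / p))

    rng = np.random.default_rng(3)
    n, p, B = 100, 20, 50
    X = rng.standard_normal((n, p))
    y = X[:, 0] + X[:, 1] + rng.standard_normal(n)

    for alpha in [0.01, 0.05, 0.1, 0.3]:
        S = np.zeros((B, p))
        for b in range(B):
            idx = rng.choice(n, size=n // 2, replace=False)
            S[b] = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_ != 0
        print(f"alpha={alpha}: stability={stability(S):.3f}")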


Fast nonparametric feature selection with error control using integrated path stability selection

Melikechi, Omar, Dunson, David B., Miller, Jeffrey W.

arXiv.org Machine Learning

Feature selection can greatly improve performance and interpretability in machine learning problems. However, existing nonparametric feature selection methods either lack theoretical error control or fail to accurately control errors in practice. Many methods are also slow, especially in high dimensions. In this paper, we introduce a general feature selection method that applies integrated path stability selection to thresholding to control false positives and the false discovery rate. The method also estimates q-values, which are better suited to high-dimensional data than p-values. We focus on two special cases of the general method based on gradient boosting (IPSSGB) and random forests (IPSSRF). Extensive simulations with RNA sequencing data show that IPSSGB and IPSSRF have better error control, detect more true positives, and are faster than existing methods. We also use both methods to detect microRNAs and genes related to ovarian cancer, finding that they make better predictions with fewer features than other methods.
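
A rough sketch of the nonparametric ingredient: on each subsample a random forest is fit, a feature counts as selected when its importance clears a threshold, and the thresholds play the role of a regularization path. The integration over the path, the E(FP) calibration, and the q-values are the paper's actual contributions and are omitted here; the authors also provide their own implementation.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(5)
    n, p, B = 150, 20, 20
    X = rng.standard_normal((n, p))
    y = X[:, 0] + X[:, 1] + 0.3 * rng.standard_normal(n)

    thresholds = np.linspace(0.01, 0.2, 15)        # stand-in for a penalty path
    paths = np.zeros((B, len(thresholds), p))
    for b in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        imp = RandomForestRegressor(n_estimators=100, random_state=b).fit(
            X[idx], y[idx]).feature_importances_
        paths[b] = imp[None, :] > thresholds[:, None]

    # Selection frequency averaged over subsamples and thresholds.
    scores = paths.mean(axis=(0, 1))
    print("top features:", np.argsort(-scores)[:5])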


Integrated path stability selection

Melikechi, Omar, Miller, Jeffrey W.

arXiv.org Machine Learning

Stability selection is a widely used method for improving the performance of feature selection algorithms. However, stability selection has been found to be highly conservative, resulting in low sensitivity. Further, the theoretical bound on the expected number of false positives, E(FP), is relatively loose, making it difficult to know how many false positives to expect in practice. In this paper, we introduce a novel method for stability selection based on integrating the stability paths rather than maximizing over them. This yields a tighter bound on E(FP), resulting in a feature selection criterion that has higher sensitivity in practice and is better calibrated in terms of matching the target E(FP). Our proposed method requires the same amount of computation as the original stability selection algorithm, and only requires the user to specify one input parameter, a target value for E(FP). We provide theoretical bounds on performance, and demonstrate the method on simulations and real data from cancer gene expression studies.
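
The contrast between the two decision rules is easy to sketch: classic stability selection scores each feature by the maximum of its stability path, while integrated path stability selection scores it by an integral along the path. The sketch below uses a uniform-weight average as a crude quadrature; the paper's integration and its E(FP) calibration are not reproduced.

    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(4)
    n, p, B = 120, 25, 50
    X = rng.standard_normal((n, p))
    y = X[:, 0] + X[:, 1] + rng.standard_normal(n)

    alphas = np.geomspace(0.5, 0.01, 30)           # decreasing penalty path
    paths = np.zeros((B, len(alphas), p))
    for b in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        _, coefs, _ = lasso_path(X[idx], y[idx], alphas=alphas)
        paths[b] = (coefs != 0).T                  # selection indicators along path

    freq = paths.mean(axis=0)                      # stability paths, (len(alphas), p)
    max_score = freq.max(axis=0)                   # stability selection rule
    int_score = freq.mean(axis=0)                  # integrated rule (uniform weight)
    print("max rule top 5:     ", np.argsort(-max_score)[:5])
    print("integral rule top 5:", np.argsort(-int_score)[:5])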


Cluster Stability Selection

Faletto, Gregory, Bien, Jacob

arXiv.org Machine Learning

Stability selection (Meinshausen and Bühlmann, 2010) makes any feature selection method more stable by returning only those features that are consistently selected across many subsamples. We prove (in what is, to our knowledge, the first result of its kind) that for data containing highly correlated proxies for an important latent variable, the lasso typically selects one proxy, yet stability selection with the lasso can fail to select any proxy, leading to worse predictive performance than the lasso alone. We introduce cluster stability selection, which exploits the practitioner's knowledge that highly correlated clusters exist in the data, resulting in better feature rankings than stability selection in this setting. We consider several feature-combination approaches, including taking a weighted average of the features in each important cluster, where the weights are determined by the frequency with which cluster members are selected; we show that this leads to better predictive models than previous proposals. We present generalizations of theoretical guarantees from Meinshausen and Bühlmann (2010) and Shah and Samworth (2012) to show that cluster stability selection retains the same guarantees. In summary, cluster stability selection enjoys the best of both worlds, yielding a sparse selected set that is both stable and has good predictive performance.
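
The weighted-averaging idea can be sketched as follows: given a known cluster of correlated proxies, compute per-proxy selection frequencies across subsamples, report the frequency with which any proxy is selected, and combine the proxies into a single representative weighted by those frequencies. The clustering is assumed known here, and the details differ from the paper's full procedure.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(6)
    n, p, B = 200, 10, 50
    latent = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    X[:, :3] = latent[:, None] + 0.1 * rng.standard_normal((n, 3))  # proxy cluster
    y = latent + rng.standard_normal(n)

    sel = np.zeros((B, p))
    for b in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        sel[b] = Lasso(alpha=0.1).fit(X[idx], y[idx]).coef_ != 0

    cluster = [0, 1, 2]
    print("any-proxy frequency:", sel[:, cluster].any(axis=1).mean())

    # Cluster representative: proxies averaged with selection-frequency weights.
    freq = sel.mean(axis=0)
    w = freq[cluster] / freq[cluster].sum()
    rep = X[:, cluster] @ w
    print("corr(representative, latent):", np.corrcoef(rep, latent)[0, 1])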


Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features

Bommert, Andrea, Rahnenführer, Jörg, Lang, Michel

arXiv.org Machine Learning

Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g. highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature selection stability. We evaluate this approach on both simulated and real data sets, and we compare it to the standard approach of single-criteria tuning of the hyperparameters as well as to the state-of-the-art technique "stability selection". We conclude that our approach achieves the same or better predictive performance compared to the two established approaches. Considering the stability during tuning does not decrease the predictive accuracy of the resulting models. Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features. The single-criteria approach fails to avoid irrelevant or redundant features, and the stability selection approach fails to select enough relevant features to achieve acceptable predictive accuracy. For our approach, on data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure, that is, a measure that considers similarities between features. For data sets with only a few similar features, an unadjusted stability measure suffices and is faster to compute.
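
A minimal sketch of the multi-criteria idea: for each candidate penalty, estimate predictive error by cross-validation and selection stability by the average pairwise Jaccard similarity of selected sets across subsamples, then inspect the trade-off. The paper's adjusted stability measure, which also credits exchanges between similar features, is not implemented here.

    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)
    n, p, B = 150, 20, 25
    X = rng.standard_normal((n, p))
    y = X[:, 0] + X[:, 1] + rng.standard_normal(n)

    for alpha in [0.01, 0.05, 0.1, 0.2]:
        sets = []
        for _ in range(B):
            idx = rng.choice(n, size=n // 2, replace=False)
            coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
            sets.append(set(np.flatnonzero(coef)))
        stab = np.mean([len(s & t) / max(len(s | t), 1)
                        for s, t in combinations(sets, 2)])
        mse = -cross_val_score(Lasso(alpha=alpha), X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"alpha={alpha}: cv_mse={mse:.2f}, stability={stab:.2f}")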


View selection in multi-view stacking: Choosing the meta-learner

van Loon, Wouter, Fokkema, Marjolein, Szabo, Botond, de Rooij, Mark

arXiv.org Machine Learning

Multi-view stacking is a framework for combining information from different views (i.e. different feature sets) describing the same set of objects. In this framework, a base-learner algorithm is trained on each view separately, and their predictions are then combined by a meta-learner algorithm. In a previous study, stacked penalized logistic regression, a special case of multi-view stacking, was shown to be useful for identifying which views are most important for prediction. In this article, we expand on this research by considering seven different algorithms for use as the meta-learner and evaluating their view selection and classification performance in simulations and in two applications on real gene-expression data sets. Our results suggest that if both view selection and classification accuracy are important to the research at hand, then the nonnegative lasso, nonnegative adaptive lasso and nonnegative elastic net are suitable meta-learners. Exactly which among these three is to be preferred depends on the research context. The remaining four meta-learners, namely nonnegative ridge regression, nonnegative forward selection, stability selection and the interpolating predictor, show little advantage over the other three.
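
A minimal sketch of multi-view stacking with a nonnegative lasso meta-learner: base learners are fit per view, their out-of-fold predictions become meta-features, and the nonnegativity constraint together with the L1 penalty pushes unhelpful views to exactly zero weight. The views, base learner, and penalties below are all illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso, LogisticRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(8)
    n = 200
    views = [rng.standard_normal((n, 10)) for _ in range(3)]
    y = (views[0][:, 0] + 0.5 * views[1][:, 0]
         + rng.standard_normal(n) > 0).astype(int)

    # One meta-feature per view: out-of-fold base-learner probabilities.
    Z = np.column_stack([
        cross_val_predict(LogisticRegression(max_iter=1000), V, y,
                          cv=5, method="predict_proba")[:, 1]
        for V in views
    ])

    meta = Lasso(alpha=0.01, positive=True).fit(Z, y)
    print("view weights:", meta.coef_)   # a zero weight deselects the view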


Semi-analytic approximate stability selection for correlated data in generalized linear models

Takahashi, Takashi, Kabashima, Yoshiyuki

arXiv.org Machine Learning

We consider the variable selection problem for generalized linear models (GLMs). Stability selection (SS) is a promising method for solving this problem. Although SS provides practical variable selection criteria, it is computationally demanding because it requires fitting GLMs to many re-sampled datasets. We propose a novel approximate inference algorithm that can conduct SS without this repeated fitting. The algorithm is based on the replica method from statistical mechanics and vector approximate message passing from information theory. For datasets characterized by rotation-invariant matrix ensembles, we derive state evolution equations that macroscopically describe the dynamics of the proposed algorithm. We also show that their fixed points are consistent with the replica symmetric solution obtained by the replica method. Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
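
For contrast, here is the naive, repeated-fitting stability selection that the proposed algorithm approximates without resampling, sketched with an L1-penalized logistic GLM; the replica and message-passing machinery itself is well beyond a short sketch.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(9)
    n, p, B = 200, 30, 100
    X = rng.standard_normal((n, p))
    y = (X[:, 0] - X[:, 1] + rng.standard_normal(n) > 0).astype(int)

    freq = np.zeros(p)
    for _ in range(B):                    # the costly loop the paper replaces
        idx = rng.choice(n, size=n // 2, replace=False)
        fit = LogisticRegression(penalty="l1", C=0.5,
                                 solver="liblinear").fit(X[idx], y[idx])
        freq += fit.coef_.ravel() != 0
    print("selection frequencies:", freq / B)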